## PC1 PC2 group condition replicate name
## WT1 -15.446124 6.985017 Control:1 Control 1 WT1
## WT2 -10.845993 13.371109 Control:2 Control 2 WT2
## WT3 -14.324961 4.010518 Control:3 Control 3 WT3
## Ab1 -8.697130 -8.343581 Light:1 Light 1 Ab1
## Ab2 -9.401742 -8.836389 Light:2 Light 2 Ab2
## Ab3 -11.008402 -7.357857 Light:3 Light 3 Ab3
## Wg1 18.418470 -1.204799 Wnt:1 Wnt 1 Wg1
## Wg2 7.851189 -10.885567 Wnt:2 Wnt 2 Wg2
## Wg3 13.544888 1.181744 Wnt:3 Wnt 3 Wg3
## Wg_Ab1 15.524041 5.254569 Wnt_Light:1 Wnt_Light 1 Wg_Ab1
## Wg_Ab2 14.385764 5.825236 Wnt_Light:2 Wnt_Light 2 Wg_Ab2
# Drop rRNA, snoRNA and snRNA genes, then genes with zero counts in every sample
unwanted <- ensembl.genes$gene_id[ensembl.genes$gene_biotype %in% c("rRNA", "snoRNA", "snRNA")]
dds <- dds[!(rownames(dds) %in% unwanted), ]
dds <- dds[rowSums(counts(dds)) > 0, ]
##
## chr2L chr2R chr3L chr3R chr4 chrX chrY chrM
## 2634 2929 2698 3362 80 2157 41 37
## PC1 PC2 group condition replicate name
## WT1 -15.446207 6.976894 Control:1 Control 1 WT1
## WT2 -10.803245 13.412923 Control:2 Control 2 WT2
## WT3 -14.326604 4.023965 Control:3 Control 3 WT3
## Ab1 -8.722458 -8.353524 Light:1 Light 1 Ab1
## Ab2 -9.403147 -8.828752 Light:2 Light 2 Ab2
## Ab3 -10.999116 -7.357691 Light:3 Light 3 Ab3
## Wg1 18.460587 -1.162145 Wnt:1 Wnt 1 Wg1
## Wg2 7.809082 -10.899188 Wnt:2 Wnt 2 Wg2
## Wg3 13.555007 1.189389 Wnt:3 Wnt 3 Wg3
## Wg_Ab1 15.495600 5.197507 Wnt_Light:1 Wnt_Light 1 Wg_Ab1
## Wg_Ab2 14.380501 5.800622 Wnt_Light:2 Wnt_Light 2 Wg_Ab2
## WT1 WT2 WT3 Ab1 Ab2 Ab3 Wg1 Wg2
## 1.0823335 0.8944507 0.8968479 1.5526588 0.9047126 1.2777171 0.9656761 0.9258778
## Wg3 Wg_Ab1 Wg_Ab2
## 1.0729766 0.8827814 0.7883241
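The per-sample values above appear to be DESeq2 size factors (an assumption, since the output is not labelled). As a rough illustration of where such numbers come from, the following base-R sketch computes median-of-ratios size factors on a made-up count matrix. The function name `median_ratio_size_factors` and the toy data are hypothetical; in the actual analysis one would presumably call `estimateSizeFactors(dds)` and then `sizeFactors(dds)`.

```r
# Median-of-ratios size factors, sketched in base R (hypothetical helper).
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Log geometric mean of each gene across samples; -Inf if any count is zero.
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  # For each sample, take the median ratio to the per-gene geometric mean.
  apply(log_counts[usable, , drop = FALSE], 2, function(sample_log) {
    exp(median(sample_log - log_geo_means[usable]))
  })
}

# Toy data: S2 was sequenced twice as deeply as S1; gene4 has a zero count.
toy_counts <- cbind(S1 = c(10, 20, 30, 0), S2 = c(20, 40, 60, 5))
rownames(toy_counts) <- paste0("gene", 1:4)

toy_size_factors <- median_ratio_size_factors(toy_counts)
print(toy_size_factors)
```

Genes with a zero count in any sample are excluded from the calculation, which is why `gene4` does not influence the result.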
The genes with greater than 5^{5} normalised counts are:
| | seqnames | start | end | width | strand | gene_id | gene_biotype | entrezgene_id | external_gene_name |
|---|---|---|---|---|---|---|---|---|---|
| FBgn0000079 | chr2R | 17118636 | 17120303 | 1668 | | FBgn0000079 | protein_coding | 47764 | Amy-p |
| FBgn0003356 | chr3R | 29923754 | 29924615 | 862 | | FBgn0003356 | protein_coding | 43544 | Jon99Cii |
| FBgn0003357 | chr3R | 29922216 | 29923455 | 1240 | | FBgn0003357 | protein_coding | 43543 | Jon99Ciii |
| FBgn0003863 | chr2R | 11344260 | 11345119 | 860 | | FBgn0003863 | protein_coding | 48316 | alphaTry |
| FBgn0013674 | chrM | 1474 | 3009 | 1536 | | FBgn0013674 | protein_coding | 192469 | mt:CoI |
| FBgn0033774 | chr2R | 12762113 | 12763825 | 1713 | | FBgn0033774 | protein_coding | 36410 | CG12374 |
| FBgn0035665 | chr3L | 6050757 | 6051690 | 934 | | FBgn0035665 | protein_coding | 38683 | Jon65Aiii |
| FBgn0036024 | chr3L | 9641013 | 9641913 | 901 | | FBgn0036024 | protein_coding | 39125 | CG18180 |
| FBgn0040060 | chr3L | 6035216 | 6036175 | 960 | | FBgn0040060 | protein_coding | 38680 | yip7 |
| FBgn0250815 | chr3L | 6039155 | 6040135 | 981 | | FBgn0250815 | protein_coding | 38682 | Jon65Aiv |
All of these are annotated as protein-coding genes, so we will not remove them.
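A listing like the one above can be produced by selecting genes whose normalised count exceeds a cutoff and looking them up in the annotation. The sketch below uses base R on made-up data and assumes "exceeds the cutoff in at least one sample" as the criterion, which the report does not state. The helper `genes_above_threshold`, the toy matrix, the toy annotation and the cutoff of 1000 are all hypothetical; with a DESeq2 object the matrix would typically come from `counts(dds, normalized = TRUE)`.

```r
# Return the IDs of genes whose maximum value across samples exceeds `threshold`
# (hypothetical helper; the "any sample" criterion is an assumption).
genes_above_threshold <- function(mat, threshold) {
  rownames(mat)[apply(mat, 1, max) > threshold]
}

# Toy normalised counts (genes x samples); the values are invented.
toy_norm <- matrix(
  c(  10,   12,    9,
    4000, 3500, 5000,
       0,    3,    1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2", "S3"))
)

# Toy annotation keyed by gene ID.
toy_annotation <- data.frame(
  gene_biotype = c("protein_coding", "protein_coding", "pseudogene"),
  row.names = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

high_genes <- genes_above_threshold(toy_norm, threshold = 1000)
high_table <- toy_annotation[high_genes, , drop = FALSE]
print(high_table)
```

Checking `gene_biotype` in the resulting table is what supports the decision to keep or drop the highly expressed genes.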
The genes with greater than 2^{4} TPM are:
| | seqnames | start | end | width | strand | gene_id | gene_biotype | entrezgene_id | external_gene_name |
|---|---|---|---|---|---|---|---|---|---|
| FBgn0002868 | chr3R | 9783407 | 9784370 | 964 | | FBgn0002868 | protein_coding | 41202 | MtnA |
| FBgn0003356 | chr3R | 29923754 | 29924615 | 862 | | FBgn0003356 | protein_coding | 43544 | Jon99Cii |
| FBgn0003863 | chr2R | 11344260 | 11345119 | 860 | | FBgn0003863 | protein_coding | 48316 | alphaTry |
| FBgn0004426 | chr3L | 1210485 | 1210726 | 242 | | FBgn0004426 | pseudogene | 38126 | LysC |
| FBgn0036024 | chr3L | 9641013 | 9641913 | 901 | | FBgn0036024 | protein_coding | 39125 | CG18180 |
| FBgn0040060 | chr3L | 6035216 | 6036175 | 960 | | FBgn0040060 | protein_coding | 38680 | yip7 |
| FBgn0040687 | chr3R | 4335098 | 4335506 | 409 | | FBgn0040687 | protein_coding | 50160 | CG14645 |
| FBgn0066084 | chr2R | 24903416 | 24903935 | 520 | | FBgn0066084 | protein_coding | 251466 | RpL41 |
| FBgn0250815 | chr3L | 6039155 | 6040135 | 981 | | FBgn0250815 | protein_coding | 38682 | Jon65Aiv |
Since one of these genes, LysC (FBgn0004426), is annotated as a pseudogene yet has a very high TPM, we’ll remove it from the final dataset.
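The report does not show how the removal is carried out. One possible way, sketched here in base R on made-up data, is to drop every gene that both exceeds the TPM cutoff and carries a flagged biotype. The helper `drop_flagged_biotypes`, the toy TPM matrix, the biotype vector and the cutoff of 100 are hypothetical and only illustrate the idea.

```r
# Remove genes that exceed `threshold` in any sample AND have a flagged biotype
# (hypothetical helper, not taken from the original analysis).
drop_flagged_biotypes <- function(mat, biotype, threshold, flagged = "pseudogene") {
  row_max <- apply(mat, 1, max)
  to_drop <- row_max > threshold & biotype[rownames(mat)] %in% flagged
  mat[!to_drop, , drop = FALSE]
}

# Toy TPM matrix (genes x samples); the values are invented.
toy_tpm <- matrix(
  c(500, 450,
    900, 800,
      2,   1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("S1", "S2"))
)

# Toy biotype lookup, named by gene ID.
toy_biotype <- c(g1 = "protein_coding", g2 = "pseudogene", g3 = "pseudogene")

filtered_tpm <- drop_flagged_biotypes(toy_tpm, toy_biotype, threshold = 100)
print(rownames(filtered_tpm))
```

In this toy case only `g2` is dropped: `g1` is highly expressed but protein-coding, and `g3` is a pseudogene below the cutoff.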
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] kableExtra_1.3.4 Biostrings_2.58.0
## [3] XVector_0.30.0 scales_1.1.1
## [5] reshape2_1.4.4 knitr_1.31
## [7] biomaRt_2.46.3 GenomicFeatures_1.42.3
## [9] AnnotationDbi_1.52.0 genefilter_1.72.1
## [11] ggplot2_3.3.3 DESeq2_1.30.1
## [13] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [15] MatrixGenerics_1.2.1 matrixStats_0.58.0
## [17] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
## [19] IRanges_2.24.1 S4Vectors_0.28.1
## [21] BiocGenerics_0.36.0
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-152 bitops_1.0-6 bit64_4.0.5
## [4] webshot_0.5.2 RColorBrewer_1.1-2 progress_1.2.2
## [7] httr_1.4.2 rprojroot_2.0.2 tools_4.0.5
## [10] bslib_0.2.4 utf8_1.2.1 R6_2.5.0
## [13] mgcv_1.8-34 DBI_1.1.1 colorspace_2.0-0
## [16] withr_2.4.1 tidyselect_1.1.0 prettyunits_1.1.1
## [19] bit_4.0.4 curl_4.3 compiler_4.0.5
## [22] rvest_1.0.0 xml2_1.3.2 DelayedArray_0.16.3
## [25] labeling_0.4.2 rtracklayer_1.50.0 sass_0.3.1
## [28] askpass_1.1 rappdirs_0.3.3 systemfonts_1.0.1
## [31] stringr_1.4.0 digest_0.6.27 Rsamtools_2.6.0
## [34] svglite_2.0.0 rmarkdown_2.7 pkgconfig_2.0.3
## [37] htmltools_0.5.1.1 highr_0.8 dbplyr_2.1.1
## [40] fastmap_1.1.0 rlang_0.4.10 rstudioapi_0.13
## [43] RSQLite_2.2.6 farver_2.1.0 jquerylib_0.1.3
## [46] generics_0.1.0 jsonlite_1.7.2 BiocParallel_1.24.1
## [49] dplyr_1.0.5 RCurl_1.98-1.3 magrittr_2.0.1
## [52] GenomeInfoDbData_1.2.4 Matrix_1.3-2 Rcpp_1.0.6
## [55] munsell_0.5.0 fansi_0.4.2 lifecycle_1.0.0
## [58] stringi_1.5.3 yaml_2.2.1 zlibbioc_1.36.0
## [61] plyr_1.8.6 BiocFileCache_1.14.0 grid_4.0.5
## [64] blob_1.2.1 crayon_1.4.1 lattice_0.20-41
## [67] splines_4.0.5 annotate_1.68.0 hms_1.0.0
## [70] locfit_1.5-9.4 pillar_1.6.0 geneplotter_1.68.0
## [73] XML_3.99-0.6 glue_1.4.2 evaluate_0.14
## [76] vctrs_0.3.7 gtable_0.3.0 openssl_1.4.3
## [79] purrr_0.3.4 assertthat_0.2.1 cachem_1.0.4
## [82] xfun_0.22 xtable_1.8-4 viridisLite_0.4.0
## [85] survival_3.2-10 tibble_3.1.0 GenomicAlignments_1.26.0
## [88] memoise_2.0.0 ellipsis_0.3.1